EntityBases: Compiling, Organizing and Querying Massive Entity Repositories
Authors
Abstract
Current approaches to linking information across sources, commonly called record linkage, find attributes that the sources share and compare records on those attributes. The results are frequently unsatisfactory because sources are missing information or contain incorrect or outdated values. We address this problem by developing the technology to build massive entity knowledge bases, which we call EntityBases. The key idea is to create a comprehensive knowledge base for the entities of interest (e.g., companies). To build such a knowledge base, we must address two issues: linking entities with multi-valued attributes obtained from heterogeneous sources, and providing a virtual repository that can be queried efficiently. This paper describes how we have addressed these issues and shows how an EntityBase™ can be used for understanding and linking text documents.
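As a rough illustration of what linking entities with multi-valued attributes can involve, the sketch below scores an incoming record against a stored entity. It is not the method described in the paper: the token-overlap (Jaccard) measure, the averaging rule, and the names `token_jaccard`, `attribute_similarity`, and `match_score` are all assumptions chosen for brevity. Attributes missing from either side are skipped rather than counted as mismatches, which reflects the abstract's point that sources are often incomplete.

```python
from itertools import product


def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word tokens (an assumed, simple measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def attribute_similarity(values_a: set, values_b: set) -> float:
    """Best similarity over all pairs of values of a multi-valued attribute."""
    return max(token_jaccard(a, b) for a, b in product(values_a, values_b))


def match_score(entity: dict, record: dict) -> float:
    """Average attribute similarity over attributes populated on both sides."""
    shared = [k for k in entity if k in record and entity[k] and record[k]]
    if not shared:
        return 0.0
    total = sum(attribute_similarity(entity[k], record[k]) for k in shared)
    return total / len(shared)


# Hypothetical data: a stored entity with several known names and cities.
entity = {
    "name": {"Acme Corporation", "Acme Corp"},
    "city": {"Los Angeles", "Pasadena"},
}
record = {"name": {"acme corp"}, "city": {"Pasadena"}, "phone": set()}
other = {"name": {"Globex Inc"}, "city": {"Springfield"}}

print(match_score(entity, record))
print(match_score(entity, other))
```

Taking the best-matching pair of values per attribute lets a record match an entity through any one of its known names or locations, which is the main difference from comparing single-valued fields.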
Similar resources
XQuery Evaluation with Relevance Ranking in Structured Peer-to-Peer Systems
This paper addresses the problem of publishing, indexing, and querying large XML data repositories distributed over an existing peer-to-peer (P2P) service infrastructure. Our architecture scales gracefully to the network and data sizes by supporting thousands of nodes, massive data, and frequent queries and updates. It is fully distributed, fault tolerant and self-organizing, and handles comple...
Analysis and design of approximate queries over XML documents using statistical techniques
In the last few years several repositories for storing XML documents and languages for querying XML data have been studied and implemented. All the query languages proposed so far allow to obtain exact answers, but when applied to large XML repositories or warehouses, such precise queries may require high response times. To overcome this problem, in traditional relational warehouses fast approx...
Query-By-Keywords (QBK): Query Formulation Using Semantics and Feedback
The staples of information retrieval have been querying and search, respectively, for structured and unstructured repositories. Processing queries over known, structured repositories (e.g., Databases) has been well-understood, and search has become ubiquitous when it comes to unstructured repositories (e.g., Web). Furthermore, searching structured repositories has been explored to a limited ext...
DBGlobe: A Data-Centric Approach to Global Computing
In the near future, there will be increasingly powerful computers in smart cards, telephones, and other information appliances. This will create a massive infrastructure composed of highly diverse interconnected mobile entities. In this paper, we present a data-centric approach to storage and querying in such environments. At a first level, we view each entity as a miniature database; at a seco...
A Native Extensible XML Query Processor Towards Efficient and Effective MPEG-7 Querying
In recent years the production of massive amounts of visual information has led to the arrival of very large multimedia Digital Libraries (DLs). The key to support efficient search and management operations in such repositories is to exploit metadata information for digital media, such as MPEG7 [4] based ones, which seem to be the most widely accepted. The underlying XML syntax, together with t...